Modeling the ribosome as a bipartite graph

Developing mathematical representations of biological systems that can allow predictions is a challenging and important research goal. It is demonstrated here how the ribosome, the nano-machine responsible for synthesizing all proteins necessary for cellular life, can be represented as a bipartite network. Ten ribosomal structures from Bacteria and six from Eukarya are explored. Ribosomal networks are found to exhibit unique properties despite variations in the nodes and edges of the different graphs. The ribosome is shown to exhibit very large topological redundancies, demonstrating mathematical resiliency. These results can potentially explain how it can function consistently despite changes in composition and connectivity. Furthermore, this representation can be used to analyze ribosome function within the large machinery of network theory, where the degrees of freedom are the possible interactions, and can be used to provide new insights for translation regulation and therapeutics.


Introduction
The ribosome, responsible for the coded synthesis of all cellular proteins, is the ultimate nanomachine. It is composed of two particles, referred to as the small (SSU) and large (LSU) subunits. Each subunit contains ribosomal ribonucleic acid (rRNA) and proteins. An active ribosome is formed when a messenger RNA (mRNA) binds to the SSU, which then joins with the LSU. The mRNA, which has been copied from a gene in the DNA, encodes a protein. This protein product is synthesized one residue at a time, each residue corresponding to one codon (three nucleotides) on the mRNA, while the ribosome undergoes a cyclic series of transitions that are facilitated by proteins known as elongation factors. The ribosome is thus fundamentally dynamic, constantly changing and modifying its shape throughout the process of elongation [1].
About twenty years ago, the first near-atomic resolution crystal structures were published [2,3], heralding a new era for understanding the relationship between ribosome function and structure. However, there is not yet a synthetic mathematical representation that is able to succinctly describe ribosome structure. Such a framework would enhance the use of physical modeling, bringing a new understanding of translation and enabling new therapeutics.
In a previous study, a network model of the ribosome considered a representation as a unipartite graph [4], and explored changes in the ribosome at different stages in translation. The graphs were constructed by separating the ribosome into its molecular elements (proteins, rRNAs and rRNA domains for the largest molecules, mRNA, tRNAs) and determining the interactions by seeing whether a solvent molecule of 1.4 Å could be rolled between each two elements. However, a unipartite graph cannot account for the difference between elements interacting independently and those involved in the same interaction. This connectivity model is different from the much denser ribosomal networks considered previously, where the nodes consist of each residue/nucleotide. Contacts between nodes thus include neighboring residues on the molecule where they are found, as well as additional contacts formed by other residues [5][6][7][8]. The degrees of freedom of each node in these networks are subject to the collective constraints of the molecules. More traditional simulations of ribosome function initially used coarse grain models [9,10] or very short molecular dynamics simulations [11,12]. More powerful methods were also employed so as to overcome the difficulty of calculating such large atomic simulations by combining a minimal energy landscape with molecular dynamics simulations [13]. This class of models aims to understand the ribosome in its minute detail, including all possible physical phenomena (electrostatic, dynamic, chemical), but in relatively short time scales, and requires large computational capacity. The coarse grain models are computationally much less expensive but focus on a particular physical effect (elastic network).
The premise here is that the ribosome can be mathematically modeled as a connectivity network. To accomplish this, the connectivity of the ribosome is represented as a bipartite graph, B = (U,V,E), which contains two disjoint sets of vertices U and V and the set of edges E connecting them [14,15]. Fig 1 shows a graphical representation of the bipartite graphs of Thermus thermophilus and Saccharomyces cerevisiae, representative of Bacteria and Eukarya respectively. Using this approach, the basic network characteristics of the ribosome are investigated and compared. To demonstrate its utility, we consider 10 high-resolution structural files of Bacteria and 6 structural files of Eukarya. It is shown how the characteristics of this representation may provide new insights for ribosome evolution.

Methods and definitions
The use of bipartite graphs in biology and medicine is of increasing importance [14] and basic notions have been developed [15]. To construct the bipartite graph, the ribosome is first divided into individual molecular elements, which will constitute U, the first set of vertices. A natural division is to use the individual chains in the structural file, corresponding to individual proteins, rRNAs, mRNA and tRNAs. The ribosome consists of widely disparate RNA and protein elements. This creates a dilemma when creating a ribosomal graph, as it is necessary to incorporate both the rRNAs and the ribosomal (r)proteins so that one can distinguish their different roles. Herein, the 16S (18S) and 23S (25S or 28S) rRNAs from Bacteria (Eukarya) are divided into homologous domains identified in the secondary structure [18] and denoted by D. All other elements are maintained as single nodes. When ribosomal elements are mentioned herein, they refer to this first set of nodes in the bipartite graph, which includes the individual elements and the individual domains of the rRNAs just mentioned. An example list of the ribosomal element nodes is provided in S1 Table. The residues involved in the domains can be found in the sample code of S1 File.
This representation could eventually be modified to include detailed secondary sub-structures of the rRNA such as helices and pseudo-knots, important functional centers such as pivot points [19], or divided into the evolutionary segment expansions [20,21] to explore different aspects of the ribosome. Similarly, the rproteins could be divided into globular components and extensions [22]. Networks of specific motifs, such as those with pi-conjugated bonds that might permit charge transfer [23,24], could also be explored to consider their role in the functionality and communication in different parts of the ribosome.
The second set of nodes in the bipartite graph is the interactions formed between ribosomal elements. First, all the dyadic interactions $v_{12} \in V$ formed between two elemental nodes $u_1, u_2 \in U$ are calculated. This calculation is derived from initial work by Sobolev et al. and reviewed in ref [25] and is illustrated in Fig 2. A dyadic interaction here means that a solvent molecule with a radius of 1.4 Å cannot fit at some point in space between two ribosomal elements. The advantage of this method is that it requires relatively inexpensive calculations to determine interactions, as now described. First, the Solvent Accessible Surface Area (SASA) of each residue of each ribosomal element is calculated independently from the rest of the ribosomal structure. The SASA consists of the surface area of each residue plus that of the small solvent molecule. Then, the SASA is determined for each set of two ribosomal elements taken together. If the SASA of a specific residue determined from the two-element set is smaller than that determined from the isolated element, then its SASA is partially obstructed by the other element, and an interaction is therefore present. The calculation of the SASA was done using the method of Shrake and Rupley [26], in PyMOL [27]. Extensions of this calculation can be envisioned to explore the nature of the interactions, for instance by varying the radius of the solvent or accounting for the type of molecules that interact. Sample code for this calculation is provided in S1 File, and the differences in SASA (dSASAs) are provided in S1 and S2 Texts using the standard JSON format, for two representative pdb files. It should be noted that the names of the elements in these files are those found in the Protein Data Bank [28] but that comparisons made later in the paper use the new nomenclature [29].
Fig 1. Visual representation of the bipartite network of (a) T. thermophilus from Bacteria [16] and (b) S. cerevisiae from Eukarya [17]. The squares represent ribosomal elements and the stars the interactions. Green indicates elements and interactions in the LSU, where the peptide bond formation occurs. Magenta indicates elements and interactions in the SSU, where decoding takes place. Grey indicates inter-subunit bridges and orange indicates tRNAs. To emphasize domains 2 and 5 of the 23S (25S) rRNA, they are plotted in black, with the former twice as large as the other nodes to indicate its dominant centrality and the latter 50% larger. The position of each node was obtained from the structural file by calculating the center of mass coordinates of the residues involved in the node. The three-dimensional coordinates were then projected into two dimensions. https://doi.org/10.1371/journal.pone.0279455.g001
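The SASA comparison described in the Methods can be sketched in a few lines. This is a minimal illustration, not the code of S1 File: it assumes the per-residue areas have already been computed (in the paper, with PyMOL) and stored in dictionaries. The function names, the tolerance `tol`, and all numerical values are hypothetical; the example mirrors the situation of Fig 2(c), where residues Y1 and Y3 lose accessible area and Y2 does not.

```python
def interacting_residues(sasa_alone, sasa_pair, tol=1e-6):
    """Residues whose SASA shrinks when the partner element is present."""
    return {res for res, area in sasa_alone.items()
            if area - sasa_pair[res] > tol}


def dyadic_interaction(sasa_a, sasa_b, sasa_ab, tol=1e-6):
    """Return the buried residues of each element, or None if no interaction."""
    buried_a = interacting_residues(sasa_a, sasa_ab, tol)
    buried_b = interacting_residues(sasa_b, sasa_ab, tol)
    return (buried_a, buried_b) if buried_a or buried_b else None


# Hypothetical areas (in square angstroms) for two three-residue elements.
sasa_y = {"Y1": 120.0, "Y2": 95.0, "Y3": 110.0}   # yellow element alone
sasa_b = {"B1": 100.0, "B2": 130.0, "B3": 90.0}   # blue element alone
sasa_yb = {"Y1": 80.0, "Y2": 95.0, "Y3": 70.0,    # both elements together
           "B1": 100.0, "B2": 85.0, "B3": 60.0}

contact = dyadic_interaction(sasa_y, sasa_b, sasa_yb)
far_apart = dyadic_interaction(sasa_y, sasa_b, {**sasa_y, **sasa_b})
```

Keeping the residue sets, rather than only a yes/no answer, matters for the next step of the method, which compares the residues of different dyadic interactions.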

PLOS ONE
Next, it is determined whether each interaction contains further connectivity with other elements. This is done by considering all other dyadic interactions $v_{1x}$, $v_{2x}$ that involve elements $u_1$ or $u_2$. It is then determined whether any of the residues from each of these interactions overlap with the $v_{12}$ interaction. If they do, then a connection to the element $u_x$ is added to the interaction $v_{12}$. Note that because the interacting residues are identified and not the individual atoms, an interaction surface implicates just the residues involved, and does not guarantee that the same atoms in the residue are involved. Fig 3 depicts some of the possible interconnections and how they are accounted for in the bipartite graph. To reduce redundancy, a dyadic interaction $v_{13}$ with residues in $u_1$ that completely overlap those in another interaction $v_{12}$ is removed from the set of interactions, while a connection is drawn between $u_3$ and $v_{12}$. This is illustrated in Fig 3E, where the interaction between the yellow and red elements is contained in the yellow-purple interaction. A yellow-red interaction is therefore not present, but an edge is drawn between the red protein and the yellow-purple interaction. This typically eliminates just a few interactions involving just one or two residues. Examples of the complete bi-adjacency matrices for two files are provided in S2 and S3 Tables. The methodology is summarized in Fig 4. The overall function of the ribosome can then be analyzed in terms of its connectivity using network theory, where a renaissance of research in the past 20 years provides a large arsenal of techniques [30,31]. I consider 16 ribosomal bipartite graphs constructed from structural files in the Protein Data Bank [28]: 10 from Bacteria (T. thermophilus) [16,32-35] and 6 from Eukarya (3 from S. cerevisiae, known colloquially as baker's yeast, and 3 from H. sapiens) [17,36].
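The overlap test and the removal of contained interactions can be sketched as follows. This is a simplified reading of the procedure, not the author's implementation: the input format, the function name `build_bipartite_edges`, and the residue labels are hypothetical, and a strict subset is used for removal so that two interactions with identical residues on the shared element do not eliminate each other (an assumption made here, not stated in the text).

```python
from itertools import combinations


def build_bipartite_edges(dyads):
    """dyads maps an element pair to {element: set of its residues in contact}.

    Returns {interaction: set of elements connected to it} after adding
    connections for overlapping residues and dropping contained interactions.
    """
    edges = {pair: set(pair) for pair in dyads}
    removed = set()
    for a, b in combinations(dyads, 2):
        for u in set(a) & set(b):            # element shared by both dyads
            res_a, res_b = dyads[a][u], dyads[b][u]
            if not res_a & res_b:            # no common residue: independent
                continue
            edges[a] |= set(b)               # connect the extra element
            edges[b] |= set(a)
            if res_a < res_b:                # a is contained in b (assumption:
                removed.add(a)               # strict subset only)
            elif res_b < res_a:
                removed.add(b)
    return {v: els for v, els in edges.items() if v not in removed}


# Toy data in the spirit of Fig 3E: the red-yellow contact uses a yellow
# residue that is already part of the purple-yellow contact.
dyads = {
    ("P", "Y"): {"P": {"P1"}, "Y": {"Y1", "Y2"}},
    ("R", "Y"): {"R": {"R1"}, "Y": {"Y1"}},
    ("B", "Y"): {"B": {"B1"}, "Y": {"Y3"}},
}
graph = build_bipartite_edges(dyads)
```

In this toy case the red-yellow interaction disappears as a node, while the purple-yellow interaction gains an edge to the red element and the blue-yellow interaction is left untouched.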
The ribosomes were chosen because they represent various states of elongation, with and without antibiotics and elongation factors, as indicated in S4 Table. I first focus on the basic graph properties that illustrate the similarities in the topological nature of the bacterial and eukaryotic ribosomes to identify the distinguishing features in ribosomal networks. A comparison of unipartite graphs obtained using dyadic connections and those resulting from projections of the bipartite graphs is also considered. Centrality and modularity are then explored, and the results are considered in the context of models of the evolution of the ribosome.
Fig 2. (a) The SASA of each residue is calculated for each element in isolation from the other elements in the ribosome. The three residues depicted here for each element are labeled by their color (B for Blue or Y for Yellow), an identifying number and an 'S' indicating that the SASA is calculated for a single element in isolation from all others in the ribosome. The additional radius due to a solvent molecule (green) of 1.4 Å located at the surface is also indicated. (b) The SASA of the two elements together with the solvent molecule is then calculated in the same way. Note that the final identifying label of the residues is 'B', indicating that the SASA is calculated for both elements together. The figure illustrates that when an interaction is not present, the SASA calculated for the two individual elements is equal to that calculated when the two elements are both present. (c) Here an interaction is observed because, although the area of residue Y2 is unchanged, $A_{Y2S} = A_{Y2B}$, for the other residues $A_{Y1S} > A_{Y1B}$ and $A_{Y3S} > A_{Y3B}$. https://doi.org/10.1371/journal.pone.0279455.g002
Fig 3. As in Fig 2, the ribosomal elements are depicted by colors (Blue, Yellow, Red and Purple) and the individual circles represent residues. The black ovals indicate interactions. (c) Depiction of how the interaction of residue B3 with both Y and R leads to an additional interaction between Y and R in the unipartite projection onto the elements that would not be present in the dyadic graph. Additional interactions are captured in the bipartite graph, but may not appear in the unipartite projections, as depicted in (d). (e) Depiction of how triad interactions can lead to significantly more edges in the projection onto the unipartite interaction graph. There are 4 dyadic connections: Blue-Yellow, Blue-Red, Purple-Red and Purple-Yellow. The yellow-purple interaction is justified because there is a single residue that is not present in the purple-red interaction, but the red-yellow interaction is not present because it is part of the yellow-purple interaction. Note that the dashed lines in (e) are meant to indicate additional edges that would not be present if only binary interactions were allowed. https://doi.org/10.1371/journal.pone.0279455.g003
Fig 4. From the information on the RCSB website concerning the pdb structural file, the elements are enumerated by their chains and the 16S and 23S rRNAs are further divided into domains using the residue numbering found on the Ribovision website [18]. The dyadic interactions are then determined using PyMOL to calculate the difference between the solvent accessible area of each pair of elements separately and together. Next, the residues in each dyadic interaction are compared with the residues of other dyadic interactions that contain the same elements. If any of these also include the same residues, a multiple interaction is said to exist. https://doi.org/10.1371/journal.pone.0279455.g004
Table 1. The values in the table were taken from 10 Bacteria and 6 Eukarya pdb files whose individual values are given in S4A and S4B Table. Note that for centralization the number of times a node appeared for a given element is given. https://doi.org/10.1371/journal.pone.0279455.t001
Calculations were carried out using Python, PyMOL [27], and NetworkX [37]. Table 1 summarizes the basic structural elements of the pdb files and the average basic properties of the resulting graphs. Detailed results for each file are given in S4 Table. The order of the bipartite graph B is the number of nodes $n_B$, and the numbers of nodes in each set, $n_u$ and $n_v$, are distinguished. The size $m$ is the number of edges. The average degrees of the graph, the elements and the interactions are respectively $k_B = 2m/(n_u+n_v)$, $k_u = m/n_u$, and $k_v = m/n_v$. The average shortest path length $l$ is the average, over all pairs of nodes, of $d(s,t)$, the number of nodes between elements or interactions $s$ and $t$.
A common analysis in network theory involves dividing a graph into communities based on a particular characteristic [38]. While there are many ways to do so, here communities are formed using an algorithm that finds the nodes that are more tightly connected to those in their community than to others, known as degree modularity. The communities are determined using the Clauset-Newman-Moore greedy modularity optimization algorithm [39]. The modularity score of each graph is a measure of how well the graph can be divided into communities and is defined in terms of $B_{ij}$, the bi-adjacency matrix of the graph [40].
A concept that is often explored in bipartite networks is the notion of nestedness [41]. It is particularly important for ecological networks because it distinguishes between 'generalists', which interact with many different elements, and 'specialists', which interact with a select few elements. The Nestedness metric based on Overlap and Decreasing Fill, NODF [42], is based on the idea that in a perfectly nested network, neighborhoods of nodes with lower degree are included in neighborhoods of nodes with larger degree. Consider a pair of nodes $i$, $j$ in the same set such that $k_i > k_j$; in a perfectly nested neighborhood, the number of their common neighbors $N_{ij}$ should be $k_j$. The row and column contributions, $\mathrm{NODF}_R$ and $\mathrm{NODF}_C$, are calculated over the rows (elements) and columns (interactions) of the bi-adjacency matrix, respectively, and the NODF of the network is defined from these two contributions. To explore the topology of a bipartite network, the bi-adjacency matrix is plotted such that the nodes in the rows and columns are ordered by degree. A perfectly nested network exhibits a clear upper triangular pattern. A modular network exhibits clear blocks of modular sub-networks. Finally, a bipartite network can also have elements of both modularity and nestedness, where the different blocks are dominated by a triangular structure.
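To make the nestedness measure concrete, the following sketch computes a NODF value for a 0/1 bi-adjacency matrix given as a list of rows. It follows the usual paired-overlap definition: for each pair of rows (or columns) with different degrees, the number of shared neighbors is divided by the smaller degree, and pairs with equal degree contribute zero. Two assumptions are made here: the result is returned as a fraction between 0 and 1 rather than as a percentage, and row and column pairs are pooled in a single average. The exact expression used for Table 1 is not reproduced in this text and may differ.

```python
from itertools import combinations


def nodf(matrix):
    """NODF of a 0/1 bi-adjacency matrix (list of rows), as a fraction."""

    def paired(vectors):
        total, pairs = 0.0, 0
        for a, b in combinations(vectors, 2):
            pairs += 1
            k_a, k_b = sum(a), sum(b)
            if k_a == k_b or min(k_a, k_b) == 0:
                continue                      # no decreasing fill: contributes 0
            shared = sum(x and y for x, y in zip(a, b))
            total += shared / min(k_a, k_b)   # N_ij / k_j with k_i > k_j
        return total, pairs

    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*matrix)]
    row_total, row_pairs = paired(rows)
    col_total, col_pairs = paired(cols)
    return (row_total + col_total) / (row_pairs + col_pairs)


nested = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0]]
modular = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]
nodf_nested = nodf(nested)
nodf_modular = nodf(modular)
```

The two toy matrices are the limiting cases discussed in the text: the triangular matrix is perfectly nested, while the block-diagonal matrix is purely modular and has no nestedness.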
To measure the importance of different nodes in the network, graph theoretical analysis uses centrality measures [25]. Degree centrality measures the number of connections made by a node $u$. Closeness centrality measures how many nodes are necessary to connect to another node in the network and is calculated from $L(u_1, u_2)$, the shortest distance in terms of number of nodes between nodes $u_1$ and $u_2$. Finally, betweenness centrality measures the positioning of a node on linking paths between other nodes in the network and is calculated from $\sigma_{st}$, the number of paths linking nodes $s$, $t$ that pass through node $u$. These notions can be extended to consider the importance of the single most important hub in the network using the notion of centralization [43], which is defined for a centrality measure $C$ of a graph $G$ having nodes $u$. The centralization allows for a comparison of different graphs, for instance the bipartite graph with its projections [14].
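For reference, the textbook forms of these measures are collected below. They are given as an aid to the reader under stated assumptions and are not recovered from the paper: $n$ is the number of nodes, $k_u$ the degree of node $u$, $\sigma_{st}$ the total number of shortest paths between $s$ and $t$, $\sigma_{st}(u)$ the number of those paths passing through $u$, $u^{*}$ the node with the largest centrality, and the maximum in the last line is taken over all graphs with the same number of nodes. The normalizations, and any bipartite-specific variants, used for Table 1 may differ. The block assumes the `amsmath` package.

```latex
\begin{align}
C_D(u)   &= \frac{k_u}{n-1}, \\
C_C(u_1) &= \frac{n-1}{\sum_{u_2 \neq u_1} L(u_1, u_2)}, \\
C_B(u)   &= \sum_{s \neq u \neq t} \frac{\sigma_{st}(u)}{\sigma_{st}}, \\
C(G)     &= \frac{\sum_{u} \left[ C(u^{*}) - C(u) \right]}
                 {\max \sum_{u} \left[ C(u^{*}) - C(u) \right]}.
\end{align}
```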
In order to understand the differences between the bipartite and unipartite graphs, the properties of unipartite graphs with dyadic connections are considered. In addition, projection unipartite graphs are created by projecting the bipartite graphs onto the elements and onto the interactions. Each projection is obtained by removing one of the sets of nodes and connecting the nodes of the remaining set that shared a neighbor in the removed set.
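A projection of this kind can be sketched with plain dictionaries and sets. The helper names `project` and `invert` and the node labels are hypothetical; the toy graph contains one three-element interaction, which is enough to show how a projection onto the elements gains an edge (here R-Y or B-R) that need not exist in a purely dyadic graph.

```python
from collections import defaultdict
from itertools import combinations


def project(neighbors):
    """neighbors maps each node of the removed set to its neighbors.

    Two nodes of the kept set are linked if they share a removed node.
    """
    edges = set()
    for kept in neighbors.values():
        edges.update(frozenset(pair) for pair in combinations(sorted(kept), 2))
    return edges


def invert(neighbors):
    """Swap the roles of the two node sets."""
    inverted = defaultdict(set)
    for v, us in neighbors.items():
        for u in us:
            inverted[u].add(v)
    return dict(inverted)


# interaction -> elements it connects (hypothetical labels)
interactions = {"v1": {"B", "Y", "R"}, "v2": {"P", "R"}}

element_edges = project(interactions)              # projection onto elements
interaction_edges = project(invert(interactions))  # projection onto interactions
```

NetworkX, which the paper uses, also provides bipartite projection routines; whether they were used to produce the projected graphs discussed here is not stated in this text.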
Clustering coefficients are meant to capture correlations between neighbors of connected nodes and therefore are vital to understanding graph topology. The definitions introduced in Latapy [15] are used here. Given two nodes $w_1$ and $w_2$, both belonging either to the elements U or to the interactions V, the dot cluster coefficient $cc_\bullet(w_1, w_2)$ is the number of neighbors shared by the two nodes divided by the number of nodes in the union of their neighborhoods, where $N(w)$ denotes the neighbors of node $w$; it represents the correlation of the neighborhoods of the two nodes. The cluster coefficient for a node $w_1$ is the average of $cc_\bullet(w_1, w_x)$ for all $w_x$ in the second nearest neighborhood of $w_1$, $N(N(w_1))$. As with clustering coefficients in unipartite graphs, the dot cluster coefficient can then be averaged over all the elements, over all the interactions, and over the entire bipartite graph. While the cluster coefficients explore neighborhoods of nodes, the importance of particular nodes can be ascertained using the notion of redundancy. The definition is strictly mathematical. It determines topologically the fraction of pairs of neighbors of a node $u_1$ that are both linked to another node other than $u_1$. Note that this calculation can only be done if the degree of the node is at least 2, since at least two neighbors are needed to form a pair. The redundancy coefficients of the elements, the interactions and the entire graph are obtained by again averaging over all constituents, including those that are pendant or isolated. This definition makes no assumptions about biophysical or evolutionary aspects of the connectivity.
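The dot cluster coefficient and the redundancy coefficient can be written directly from these definitions. The sketch below uses a plain adjacency dictionary; the function names and the toy graph are hypothetical, and assigning a redundancy of zero to nodes with fewer than two neighbors is an assumption made here so that pendant and isolated nodes can be included in an average.

```python
from itertools import combinations


def second_neighbors(adj, w):
    """N(N(w)): nodes of the same set that share a neighbor with w."""
    return {x for n in adj[w] for x in adj[n]} - {w}


def cc_dot_pair(adj, w1, w2):
    """Shared neighbors divided by the union of the two neighborhoods."""
    return len(adj[w1] & adj[w2]) / len(adj[w1] | adj[w2])


def cc_dot(adj, w):
    """Average pairwise coefficient over the second neighborhood of w."""
    nn = second_neighbors(adj, w)
    return sum(cc_dot_pair(adj, w, x) for x in nn) / len(nn) if nn else 0.0


def redundancy(adj, w):
    """Fraction of neighbor pairs of w that are also linked through another node."""
    pairs = list(combinations(sorted(adj[w]), 2))
    if not pairs:
        return 0.0  # assumption: pendant or isolated nodes count as zero
    linked = sum(1 for a, b in pairs if (adj[a] & adj[b]) - {w})
    return linked / len(pairs)


# Toy bipartite graph: elements A, B, C and interactions v1, v2, v3.
adj = {
    "A": {"v1", "v2"}, "B": {"v1", "v2", "v3"}, "C": {"v3"},
    "v1": {"A", "B"}, "v2": {"A", "B"}, "v3": {"B", "C"},
}
```

In this toy graph, interaction v1 is fully redundant because A and B are also joined by v2, whereas v3 is not redundant because it is the only link between B and C. NetworkX provides bipartite clustering and node redundancy routines based on the same reference; whether they were used for the values reported here is not stated in this text.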
The bipartite graphs in Fig 1 were generated from the positions found in the structural file. The placement of each element was calculated by averaging the position of the phosphate atoms in the RNA or the α-carbon atoms in the proteins. Each interaction position was found by calculating the average position of all the phosphates or α-carbons in the residues involved in the interaction. The coordinates were then projected onto a two-dimensional surface. The same technique is used to draw the projections onto the elements and the interactions.
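As a small illustration of the placement step, the sketch below averages a list of atomic coordinates and reduces the result to two dimensions. The coordinates are hypothetical, and the way the projection onto two dimensions was actually performed is not specified in this text; dropping the z coordinate is only the simplest possible assumption.

```python
def centroid(coords):
    """Average position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[axis] for c in coords) / n for axis in range(3))


def to_2d(point):
    """ASSUMPTION: project by discarding z; the paper's projection may differ."""
    return point[0], point[1]


# Hypothetical phosphate (or alpha-carbon) positions of one node
node_atoms = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]
position = to_2d(centroid(node_atoms))
```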

Graph topology
The visual representations of the two networks in Fig 1 show a high density of elemental nodes near the core overlapping region of the subunits and near the perimeter of the ribosome. The interactions are more evenly distributed throughout the structure. Fig 5 shows the projections of each bipartite graph onto the unipartite element and interaction graphs. It is observed that the elements near the subunit interface are predominantly rRNAs (triangles) and those along the edges are rproteins (circles). There is a direct correspondence between this representation and the three-dimensional atomic descriptions and analyses of ribosome structures [44].
The highly conserved nature of the different ribosomal elements across all domains of life has been used to explore the nature of the origins of life [45]. While there is not yet a consensus concerning the rooting of the tree of life and its form [46], what seems clear is that Bacteria and Eukarya inherited the ribosome from a common origin [47]. Each then evolved, with the eukaryotic ribosome becoming more complex. Understanding the differences in the two networks shown in Fig 1 may thus shed light on the different evolutionary paths of Bacteria and Eukarya and also on the ribosome from which they originated. Table 1 shows that Eukarya has 30% more elements, 70% more interactions, and about double the number of interconnections. In addition, there are significantly more rproteins, which comprise approximately 50% by mass, and many more rprotein-rprotein interactions [44]. These large compositional changes seem to imply that the network structure should exhibit significant differences in its characteristics. Nevertheless, it was found that many parameters, including the average path length, diameter, redundancy, degree centralization and modularity, are remarkably similar. A priori, this suggests that the growth mechanisms from the common ancestral origin to the modern bacterial or eukaryotic ribosome were similar. While the methodology here is much too coarse to make any assumptions about what such mechanisms are, one can nevertheless point out some common characteristics.
A distinguishing feature of the ribosome is its large modularity (~0.5) and small nesting (NODF~0.12). While the modular groups are discussed in more detail in the next section, it is noted here that they are divided along the lines of the subunits. This modular grouping is a consequence of the rarity of intersubunit bridges. It supports the hypothesis of different evolutions of the subunits, as has been discussed in research focusing on the evolution of the ribosome [48]. As a result, graph properties that depend on calculations involving the distance between different nodes are very similar in the two ribosomes. Specifically, the average path length and the diameter are similar even though the eukaryotic ribosome is more complex.
Another important similarity in the two graphs is the high centralization of the ribosome, which is dominated by the 23S/25S D2. To emphasize this node, it is displayed as the largest marker in black in Figs 1 and 5(A) and 5(B). Models of network growth tend towards the idea that the oldest elements of the network will have the most connections [40]. This is a consequence of preferential attachment, in which new nodes connect preferentially to nodes that already have many connections, so that the oldest elements accumulate the most. It is used to explain the occurrence of scale-free networks [49].
The most ancient part of the ribosome is still a subject of debate. Many researchers believe that the origin of the ribosome is the peptidyl transferase center (PTC), found in 23S/25S D5 [50], and shown to exhibit the largest connectivity within the 23S rRNA [51]. The hypothesis that the PTC is the oldest component of the ribosome seems to solve the chicken-and-egg problem: the ability to synthesize large proteins must exist in order to synthesize the ribosome. Other researchers used phylogenetic methods to find that the oldest region was the decoding center (helix 44 in 16S/18S rRNA, corresponding to D3'm) and that the GTPase center in D2 was the oldest region in the 23S rRNA [52]. Recent work has shown that rRNAs might have initially encoded tRNAs and the key protein molecules involved in the ribosome [53], which opens up new possibilities for ribosome evolution. To explore the question of the oldest component of the 23S rRNA using graph theory, determination of the interactions in the secondary structure is necessary.
To explore connectivity in the networks further, Fig 6 plots the node degree distribution. As seen in Table 1, the nodes of largest degree in both networks change depending on the state of the ribosome. Fig 6 shows that the networks exhibit scale-free behavior with exponents between 1.44 and 1.57 (Table 2). Note that the cumulative distribution function (CDF) deviates from scale-free behavior for very large degrees, which may be related to the physical limitations of forming further connections. From this analysis it seems that preferential attachment likely played a role in network growth. Further discussion of centrality is found in section 3.3.
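A degree distribution of this kind can be examined with a short script. The sketch below builds the empirical complementary cumulative distribution, P(K >= k), from a list of node degrees and estimates a slope by least squares on the log-log points. Both choices are assumptions: the fitting procedure behind Table 2 is not described in this text, least-squares fits on log-log axes are known to be biased, and the exponent of a cumulative distribution differs by one from that of the underlying probability distribution. The degree list is a toy example.

```python
import math
from collections import Counter


def ccdf(degrees):
    """Return [(k, P(K >= k))] for each distinct degree k, in increasing k."""
    n = len(degrees)
    counts = Counter(degrees)
    remaining, points = n, []
    for k in sorted(counts):
        points.append((k, remaining / n))
        remaining -= counts[k]
    return points


def loglog_slope(points):
    """Least-squares slope of log P versus log k."""
    xs = [math.log(k) for k, _ in points]
    ys = [math.log(p) for _, p in points]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


points = ccdf([1, 1, 2, 4])     # toy degrees, not ribosome data
slope = loglog_slope(points)
```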
An important property of the networks is that the ribosomal elements can accommodate many more connections compared to the interactions. This is seen visually in the representations in Fig 5, where there is a large difference in sparsity between the elemental and interaction graphs. In Table 1, there is just a 10% increase in the average interaction degree when comparing Bacteria and Eukarya. This contrasts with the 50% increase in the average element degree. To grow in complexity, the ribosome is more efficient in adding to its machinery and functionality by incorporating either new elements or new parts of elements (segment expansions for RNAs and new extensions for rproteins) as opposed to increasing the size of interactions. This mathematical characteristic is also perfectly consistent with models of ribosome evolution [54][55][56]. Smaller ancient RNAs and proteins are hypothesized to have gradually added substructures to form modern rRNAs and proteins. The different roles played by the rRNAs and rproteins are an important distinguishing characteristic. The data in Table 1 show that while the rRNAs make nearly four times more interactions compared to rproteins, their clustering coefficients are 15% smaller. The larger cluster coefficients of the rproteins mean that they have greater correlation with their neighbors compared to the rRNA domains. The clustering coefficient is smaller in Eukarya compared to Bacteria, suggesting again that making many connections is difficult when the number of elements is increased, consistent with recent research on phylogenetic accretion [56].
Another striking result is that the topological redundancy of rproteins is greater than 90% in both domains of life. Unlike the cluster coefficient, the redundancy increases with more elements. The higher cluster coefficient coupled with the higher redundancy of rproteins suggests that nature's solution to the interaction bottleneck may have been to use rproteins to provide 'short cuts' (perhaps via allosteric pathways) through the ribosome. To further explore such possibilities, though, the biochemical pathways enabled by the interactions need to be determined. To do so, it is necessary to explore the nature of the residues involved in the interactions. For instance, one could consider hydrophobic interactions and whether they can act as flexible connectors between elements with the possibility of allowing a cascade of dynamic motion, a method that has proven quite effective in understanding the allostery in protein kinases [57]. One might also explore the possible role of charge transfer by considering regions of connectivity and elements where this could occur [23].
This remarkably large mathematical redundancy is intriguing. First, it should be noted that this result does not imply there is a biophysical redundancy. What appears to be an equivalent connected path mathematically is likely to have very different biochemical properties. Such paths may also have evolved in different manners. Nevertheless, a topological redundancy suggests that alternative paths may enable the ribosome to be multi-functional. It could also permit enhanced resiliency, in the mathematical sense [58], where the absence or presence of different ribosomal elements does not impede ribosome function significantly.
Table 3 compares the average basic properties of the three types of unipartite graphs for the different structural files, and the data for each of the individual structural files are given in S5A and S5B Table. The unipartite projection onto the elements has approximately double the number of connections in Bacteria and nearly three times as many in Eukarya compared to their dyadic counterparts. Consequently, the dyadic graphs appear highly disassortative, while the elemental and interaction graphs are not. In other words, in a unipartite representation, the disassortative nature may be due to rRNA domains forming many more connections than the rproteins, but from the vantage of a bipartite network, we see that the formation of larger interaction surfaces involving both rRNA domains and rproteins makes the interconnectivity between the elements significantly more uniform. For the elemental graphs, this is a result of the three-fold increase in the degrees of rproteins compared to just a two-fold increase for rRNAs. For the interaction graphs, it is once again the fact that the number of interactions is limited, causing a very large number of connections in the projection. By not considering multiple interactions, the rproteins appear to make many fewer connections and to play a much less important role in the connectivity. The cluster coefficients are also much larger in the projected graphs. However, the cluster coefficient of the rRNAs is much larger in the elemental graphs compared to the dyadic graphs, suggesting that dyadic graphs shortchange the role of the rRNAs in the connectivity: the bipartite network analysis confirms their important role in the connectivity of the network, as we know must be the case from explorations of evolution [48,50,59]. A final observation is that the cluster coefficient of the rproteins is approximately constant in Bacteria and Eukarya. Although numerous researchers have remarked that the eukaryotic ribosome exhibits increased connectivity of rproteins [44,60], in the context of network theory, the change in connectivity is a natural consequence of a greater number of nodes and similar underlying growth mechanisms. Overall, the bipartite representation thus reveals a much larger connectivity than the unipartite networks. Small changes that occur spontaneously on the ribosome without changes in elements can thus be detected and explored much more efficiently.
In this sub-section, some of the basic features of the ribosomal bipartite graphs were investigated. It was found that the highly modular nature of the ribosome has a strong impact on network measures. Another distinguishing characteristic is that the interactions are constrained in size, confirming the importance of accretion for increased functionality. Despite the importance of constraints, preferential attachment seems to be important for explaining the network connectivity. One remarkable feature is the high topological redundancy, which, when considered together with biophysical or evolutionary studies, may help explain ribosome resiliency and multifunctionality. Another important observation is just how similar the graph properties of Bacteria and Eukarya are, despite the significant increase in elements in the latter. Finally, a comparison with the unipartite graphs shows the utility of the bipartite representation.

Modularity
To explore the modularity of the networks and its relation to centrality, we restrict the discussion to a single file each from T. thermophilus and S. cerevisiae. Fig 7 illustrates the modularity for both the bacterial (a) and eukaryotic (b) ribosomes, determined using equation [2]. Fig 7(i) plots the biadjacency matrix, that is, the connectivity between elemental and interaction nodes, with colored rectangles delimiting the modular groups. The first important observation is that the ribosome structure is not nested, as the upper triangular pattern is not observed. This result is true for all the pdb structures investigated. The modular groups connect with each other, but there is no hierarchical order. To show that the modules divide along the lines of the two subunits, the elements and connections in each colored box in (i) are drawn in the same color in the 2D graphs in (ii). The elements of each modular subgroup are listed in (iii) in order of the number of connections (highest to lowest). Note that the elements in bold are those with large degree centrality, discussed in greater detail in the next section. In this particular example, the largest module in each subunit contains the elements thought to be the most ancient universal components [59]: in LSU I (D5 and D2 of 23S/25S rRNA) and in SSU I (D3'm and CD of 16S/18S). Nevertheless, this observation would need to be confirmed by considering the connectivity in the specific helices in these regions. These results support the idea of separate evolutions of the subunits, which could explain the large modular connectivity.
https://doi.org/10.1371/journal.pone.0279455.g007

As we observed in previous research [4], the modular groups change as a function of the different states. Typically, 23S-D2 and D5 are found together in the same module but 23S-D0 is found in a second LSU domain. Here we observe that the largest modular groups of T. thermophilus and S. cerevisiae in each subunit are similar, most likely because the two structural files are in the post-elongation state. The elements in the other modules can vary in significant ways, hinting at the important functional differences known to exist [44]. Differences in the LSU modules between the two species may be due to increased functionality necessary for the different types of proteins that traverse the peptide exit tunnel. Some elements change modules; for instance, uL4 appears to play different roles in the two domains of life. In Bacteria it is found in LSU II and has fewer connections compared to Eukarya, where it is found in LSU I. 23/25S D4 is present in the LSU II group in Bacteria but in the SSU II group in Eukarya. In fact, the SSU II modular group in S. cerevisiae has a significant number of elements from the LSU, demonstrating mathematically the increased importance of intersubunit bridges in Eukarya, as was observed in [60]. Another observation is that there are fewer modules in Eukarya, and they are necessarily larger. It would be interesting to compare the modularity at different stages of elongation in the two ribosomes to see whether this is a particularity of the post-peptide bond state or a generic feature of the two ribosomes.
The modular decomposition shows that the ribosome is predominantly divided along the two subunits. The largest modules in each subunit reflect what others have found to be the most highly conserved portions of the ribosome [61] and the most ancient parts, regardless of the model used. However, further work would be needed to explore whether these conserved portions are involved in the interactions, in order to gauge their importance for the graphical analysis. The modules are relatively independent, meaning that they do not exhibit a hierarchical structure relative to each other (small NODF). Further work could explore how the modularity changes with ribosomal state to gain insights into the multi-functionality of the different elements in a module. In addition, it would be interesting to investigate whether there is coevolution of elements within a module.
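To illustrate why a modular biadjacency matrix yields a small NODF, the sketch below computes the measure in its usual form (nestedness based on overlap and decreasing fill): each pair of rows, and each pair of columns, scores the percentage of the sparser vector's entries that are shared with the denser one, and pairs of equal degree score zero. The function names and the two toy matrices are hypothetical, and the implementation used for the analysis in this work may differ in its details.

```python
from itertools import combinations

def _paired_overlap(vectors):
    """Return (sum of pairwise nestedness scores in percent, number of pairs)."""
    total, pairs = 0.0, 0
    for u, v in combinations(vectors, 2):
        pairs += 1
        ku, kv = sum(u), sum(v)
        if ku == kv or min(ku, kv) == 0:
            continue  # equal degrees (or an empty vector) contribute zero
        hi, lo = (u, v) if ku > kv else (v, u)
        shared = sum(1 for a, b in zip(hi, lo) if a and b)
        total += 100.0 * shared / sum(lo)
    return total, pairs

def nodf(matrix):
    """NODF of a 0/1 biadjacency matrix given as a list of rows."""
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*matrix)]
    row_total, row_pairs = _paired_overlap(rows)
    col_total, col_pairs = _paired_overlap(cols)
    n_pairs = row_pairs + col_pairs
    return (row_total + col_total) / n_pairs if n_pairs else 0.0

# Toy matrices (rows: elements, columns: interactions), for illustration only.
nested = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0]]
modular = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]

nested_score = nodf(nested)
modular_score = nodf(modular)
print(nested_score, modular_score)
```

The upper triangular toy matrix is perfectly nested and scores 100, whereas the block-diagonal one, made of two independent modules, scores 0.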

Centrality
Centrality measures explore the role of important ribosomal elements in the network. In S6A and S6B Table the top 10 centrality nodes for each of the structural files are reported. To place these results in context, Table 4 reports the average centrality of the top 5 centrality elements found in each of the files. Some elements, such as tRNAs and the elongation factors, are only found in some of the files. Nevertheless, reporting the average value provides an idea of the overall importance.
The degree centrality may be the most interesting measure for gaining insights into the evolution from a universal ribosome. The scaling law observed in Fig 6 and the notion that older nodes should have greater connectivity [48] are directly relevant for the nodes exhibiting the largest degree centrality. Nevertheless, there are many reasons why the oldest nodes might not be the most connected. First, it depends on the mechanism of ribosome growth. If growth is random, then each of the nodes should have on average the same number of connections. If growth started from a central core and then evolved to optimize different objectives, the starting central core would appear more connected relative to other nodes not in the central core, provided growth is not explosive.
Concerning the degree centrality in the LSU, we see that 23S-D2, which is involved in intersubunit bridges and the dynamic motion of the ribosome and contains the A, P and E sites, is by far the most important node, with an average score exceeding that of the next most important node (23S-D5) by about 40%. In comparison, we see that while the homologous 25/28S D5 remains the second most important node in S. cerevisiae, as 23S-D5 is in Bacteria, its advantage over the third most important node has significantly narrowed, and in H. sapiens it is only the third most important node. Strikingly, the decrease in importance of 25/28S D5 correlates with an increase in importance of 25/28S D1. The latter plays an important role in the nascent protein exit tunnel, and its increasing importance reflects its more important role in Eukarya, which is capable of synthesizing much more complex proteins. Similarly, we see that D0, found to be one of the 4th most important nodes in Bacteria, is replaced by D3 (in position 5), which also plays an important function in the exit tunnel. These results suggest that D2 and D0 played an important role in the common core of a universal ribosome and that, as more connections and elements were added, their role relative to the total functionality of the ribosome diminished.
An earlier paper explored connectivity within the 23S and found the number of connections (in parentheses), from highest to lowest, to be: D5 (36), D4 (32), D2 (31), D1 (17), D6 (14), D3 (8) [51]. The results presented here do not consider intraconnectivity, which may account for the discrepancy; alternatively, our method may allow many more connections to be observed. Investigations of ribosome evolution using A-minor interactions posit that the oldest parts of the ribosome are D5, followed by D0 [20]. Research exploring the number of conserved residues in the different domains of 23S also found that D5 has the most conserved residues, followed by D4 and then D2 [61]. To explore this further, investigations of the network need to consider the connectivity of the individual helices and the degree centrality of their interactions.
The domain in the SSU with the largest degree, the central domain (CD), has an even larger value in Eukarya than in Bacteria. The CD contains the central pseudoknot, which provides an important structural core for the small subunit. The other domain in the top 5 is the 3'M domain, which contains the P-site and whose score is just slightly smaller in Eukarya. Both domains realize essential connections for ribosome function. The h1 and h2 in the 5' domain, h28 in the 3'M domain, and part of h44 and h45 in the 3'm domain are thought to be the oldest parts in the accretion model [59]; in the phylogenic model, the oldest parts are h44, h11 (5' domain) and h34 (3'M domain).
No rproteins appear in the top 5 degree centrality hubs. Those that are the most fundamental during assembly are hypothesized to be the most ancient, namely, uL2, uL3 and uL4 [48]. Nevertheless, other analyses of the evolution of the ribosome rRNA have used A-minor interactions [46,50] and phylogenic analysis [52]. A recent extension of the accretion model of ribosome evolution proposed that the correlation between the interactions between rproteins and rRNAs could be used to determine the age of the rproteins. The oldest rproteins were thus hypothesized to be: uL2, uL3, uL15, uL4, uL13, uL14, uL16 [62]. The rprotein with the highest degree centrality in Bacteria is uL3, which has an average score of 0.081, followed by uL15, with an average score of 0.072 and 24 connections. For both yeast and human, the rprotein with the highest degree is eL15 (scores of 0.089 and 0.09, respectively), followed by uL4 for yeast (score 0.087) and uL2 for human (score 0.071). It should be noted that in yeast the uL2 rprotein has a similar score to that in human (0.073). It is thus quite striking that the most dominant rproteins are not the same in the different domains and even the different species. This result is highly consistent with the hypothesis that rproteins were an addition to an already relatively developed rRNA-dominated ribosome.
Degree connectivity is now considered in the context of the modularity from Sec. 3.2. Fig 7(iii) lists the elements of the modules from highest to lowest degree. One can observe that 23S D2 and 23S D5, as well as uL3, are all found in the LSU I module in Bacteria. Likewise, it is observed that 16S (18S)-CD is in the SSU I module containing the decoding center (16/18S 3'm) and 16(18)S 3'M. In both the small and large subunits, the most highly connected nodes are thus in the largest module, and these modules are relatively similar in the two domains. It could be that the ribosome evolved additional regions of high connectivity because the functional regions were saturated and could not develop additional connections. This could explain why the intradomain connectivity of 23/25/28S D5 is larger than that of 23/25/28S D2 [51].
Betweenness centrality measures how often a particular node lies on the shortest paths connecting two other nodes. 23/25/28S D2 is also the dominant node here. While 23S-D5 is the second most important in Bacteria, as in the degree centrality, its score is significantly reduced in Eukarya, in favor of a different element, 18S-CD. In Bacteria we observe that EF-G appears in the betweenness centrality, emphasizing its importance. We also find tRNA-P, the hybrid state and tRNA-E. The first rproteins to appear in the betweenness centrality are involved in the first interaction to appear: uS13_uS19 (score 0.048), followed by uL2_23S-D4 (score 0.038).
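For readers unfamiliar with the measure, the following sketch computes shortest-path betweenness with Brandes' algorithm on a toy bipartite graph in which element and interaction nodes are treated alike. The labels, the helper names and the normalization by (n-1)(n-2) are assumptions made for illustration; the scores reported in this work may rely on a different normalization.

```python
from collections import deque

def adjacency(interactions):
    """Build an undirected adjacency map from interaction -> member elements."""
    adj = {}
    for inter, members in interactions.items():
        adj.setdefault(inter, set())
        for elem in members:
            adj[inter].add(elem)
            adj.setdefault(elem, set()).add(inter)
    return adj

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Every unordered pair is visited twice, so dividing by (n-1)(n-2)
    # gives the fraction of node pairs whose shortest paths pass through v.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: b * scale for v, b in bc.items()}

# Hypothetical chain: e1 - i1 - e2 - i2 - e3
toy = {"i1": ["e1", "e2"], "i2": ["e2", "e3"]}
scores = betweenness(adjacency(toy))
print(sorted(scores.items()))
```

In this chain the middle element e2 lies on four of the six shortest paths between the other nodes, while the end elements lie on none.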
Closeness centrality is a measure of how close a node is to all other nodes in the network. Strikingly, this is the only centrality measure where interactions appear in the top five and where rproteins appear. This result is consistent with the much higher redundancy of rproteins that was observed in Section 3.1 and supports the idea that rproteins may allow mathematical short cuts through the ribosome. The dominant interaction in both ribosomes is the same: the interaction between 23(25)S D4 and the rprotein uL2, neither of which appears alone in the top centrality hubs. Both uL2 and 23/25/28S D4 play an important role in connecting the two subunits via their inter-subunit bridges. Not surprisingly, this interaction appears as the dominant one for degree centralization in both graphs in the projection onto interactions (Table 3). It is striking nevertheless that uL2, which is one of the assembly proteins in Bacteria and thought to be one of the oldest, appears in the most important interaction. Another important observation is again the reduced importance of 23/25/28S-D5 in Eukarya as compared to Bacteria. This observation is consistent with structural analysis of the Eukarya [44,60]. Nevertheless, the significant differences in the number of connections of the rproteins are likely to be strongly influenced by the very different assembly in Eukarya compared to Bacteria [63].
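The way an interaction node can dominate the closeness ranking can be sketched with a breadth-first search on a toy bipartite graph. Closeness is taken here as the number of other reachable nodes divided by the sum of shortest-path distances to them, and the graph is assumed to be connected. The labels and helper names are hypothetical and do not correspond to the structures analyzed here.

```python
from collections import deque

def adjacency(interactions):
    """Build an undirected adjacency map from interaction -> member elements."""
    adj = {}
    for inter, members in interactions.items():
        adj.setdefault(inter, set())
        for elem in members:
            adj[inter].add(elem)
            adj.setdefault(elem, set()).add(inter)
    return adj

def closeness(adj):
    """Closeness of every node: (reachable nodes) / (sum of distances)."""
    scores = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search gives shortest-path lengths
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[s] = (len(dist) - 1) / total if total else 0.0
    return scores

# Hypothetical toy graph: one large interaction surface and one small one.
toy = {"i_hub": ["e1", "e2", "e3", "e4"], "i2": ["e4", "e5"]}
scores = closeness(adjacency(toy))
top = max(scores, key=scores.get)
print(top, round(scores[top], 3))
```

Here the interaction node i_hub obtains the highest closeness, ahead of the element e4 that bridges the two interactions, because it sits one step away from four elements at once.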
The analysis has considered generalized regions, not the specific structures, that are hypothesized to be the oldest parts of the ribosome. A more careful analysis might explore the specific connectivity of the different hypothesized oldest regions and whether they include conserved residues. Nevertheless, the analysis, along with that of previous work [4], does allow some relatively broad observations. First, the evolution from the universal ribosome appears to be constrained in the development of new interactions. It is also observed that the more highly connected Eukarya network exhibits a decrease in the centrality scores for 23/25/28S D5. This is likely due to specialization: with increasing complexity of the ribosome, new regions develop more functionality, reducing the importance of older ribosomal components relative to the entire ribosome. It is therefore not inconsistent with this mathematical analysis that 23/25/28S D5 could be the oldest component of the ribosome.

Conclusion
This research introduced a mathematical representation for modeling ribosome structure that can succinctly capture many characteristics. The similarity of the graph properties indicates that the growth mechanisms of Eukarya and Bacteria starting from a common ribosome were similar. The scale-free behavior of the CDF of nodes suggests that preferential attachment played at least a partial role in network growth. It was found that the properties of the ribosome were strongly influenced by two topological properties: 1) a limitation of the size of the interaction surfaces available to extend the connectivity of the network, and 2) the important functional division between communities in the LSU and SSU. The importance of 23/25/28S D5, which decreases in more complex networks, suggests that this domain may be one of the oldest. Further, the importance of different rproteins in the two different domains and two species supports the hypothesis that they were added after an rRNA core already existed.
The limitations on the size of the interaction surfaces, and the physical constraints of the ribosomal environment suggest that connectivity may be the route by which the ribosome is able to increase functionality. One of the most striking features of the analysis is the very large redundancies that are present in the networks, which may provide a key to understanding how the ribosome can function despite changes in composition and connectivity. The presence of this large mathematical redundancy should also inspire new types of architectures for realization of more robust man-made networks, for instance in technological and electronic architectures, that could significantly reduce network failure.
Using interactivity as a means for mathematically modeling complex biomolecules is a promising technique for further explorations. Such explorations could consider modifying the definitions of the elements to provide finer details, such as considering the secondary structure of the rRNAs or the rproteins. More importantly, though, the mathematical representation used in the modeling here does not take into account the biochemical underpinnings of the interactions. Combining the interactivity with the biochemical nature of the interactions is likely the most important extension of this work and would allow a deeper understanding of both the allostery present during ribosome function and the ribosome evolution.

Supporting information
S1 File. Sample code for calculating interactions using pymol. (TXT)
S1 Text. dSASA of the 5OT7 (T. thermophilus) file in json format. The information for each interaction is provided for each interacting residue in the following format (partner_name, residue_number, dSASA). (JSON)
S2 Text. dSASA of the 6t7i (S. cerevisiae) file in json format. The information for each interaction is provided for each interacting residue in the following format (partner_name, residue_number, dSASA). (JSON)
S1 Table. List of elements in the two pdb files considered in this paper. 5OT7 corresponds to T. thermophilus and 6t7i corresponds to S. cerevisiae. (PDF)
S2 Table. Bi-adjacency matrix for pdb file 5ot7 (T. thermophilus). (PDF)
S3 Table. Bi-adjacency matrix for pdb file 6t7i (S. cerevisiae). (PDF)